Start with R Day 1

Daniel Parthier

Dec 13, 2022

What is R?

  • Data wrangling
  • Statistical analysis
  • Time series analysis
  • Make maps
  • Analyse images

How can it be used?

  • In console: R

  • In the RStudio IDE

First things first

  1. Install R: https://cran.r-project.org/
  2. Then install RStudio: https://posit.co/

Alternatively

Some of you might have access to the Schmitzlab BCCN server and can use RStudio in the browser on the server; just ask for the link.

Getting started

You can start right away!

Output <- 1+1
Output
[1] 2
  • Assign numbers to a variable Output and return it
  • Variables can also hold “words” (referred to as strings or characters)
Output <- "I am not a number"
Output
[1] "I am not a number"

Tips for naming “things”

But the worst enemy you can meet will always be yourself.

Friedrich Nietzsche


  • Think about your future self

x <- 5
y <- 1.2
z <- y/sqrt(x)
  • What are we trying to do?
N_cells <- 5
SD_cells <- 1.2
SEM_cells <- SD_cells/sqrt(N_cells)
  • Be clear and explicit in what variables/functions mean
  • Snake case: SD_cells
  • Camel case: sdCells
  • Upper camel case: SdCells

Things you might see and adopt

  • Often variables are abbreviated
    • parameter as par

  • Some are associated with iterations in loops
    • i, n
    • print “something” 100 times
for(i in 1:100) {
  # start of loop (i is set to element "in" vector)
  print("something")
  # end of loop --> go back to start with new element in vector
}
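
A minimal sketch of a loop that uses i as an index to fill a result vector. The names Groups and Labels and the values in them are made up for illustration:

```r
# Hypothetical example data (not from the workshop data set)
Groups <- c("WT", "KO", "HET")
Labels <- character(length(Groups))   # pre-allocate the result vector

for (i in seq_along(Groups)) {
  # i takes the values 1, 2, 3 in turn
  Labels[i] <- paste("Group", i, "is", Groups[i])
}
Labels
```

seq_along(Groups) is used instead of 1:length(Groups) so that the loop is simply skipped when the vector is empty.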

  • Pipes: |> or %>% which are a bit like a water slide

    • The value on the left jumps in (|>) and lands in the function on the right → ()
SumOfTwo <- 1+1

SumOfTwo |>
  sqrt()
[1] 1.414214
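
A short sketch showing that a chain of pipes gives the same result as nested function calls. The vector Values is made up for illustration, and the native pipe |> assumes R 4.1 or newer:

```r
Values <- c(1, 4, 9)                 # hypothetical input

Nested <- sum(sqrt(Values))          # read from the inside out
Piped  <- Values |> sqrt() |> sum()  # read from left to right

identical(Nested, Piped)             # both give 6
```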

The building blocks¹

  • Numbers:
    • integer (1L, 1:10)
    • double (1.1, pi)
    • complex (1+2i)
  • How can I check?
One <- 1
typeof(One)
[1] "double"
is.double(One) # same pattern for other types
[1] TRUE

Integers

  • Useful for when you want to be sure it is not a double
  • Counting discrete values
4L + 1L
[1] 5
  • Indicating a position (indexing)
letters[4L]
[1] "d"
  • When indexing, R will convert your input to integers if possible
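
A small sketch of the difference between integer and double, and of how R handles a double used as an index. The variable names are made up for illustration:

```r
TypeInt <- typeof(1L)     # "integer": the L suffix forces an integer
TypeDbl <- typeof(1)      # "double": plain numbers are doubles by default
TypeSeq <- typeof(1:10)   # "integer": the colon operator creates integers

# A double index is truncated towards zero before it is used
Fourth <- letters[4.9]    # same as letters[4]
Fourth
```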

Double

  • Numbers that can have a decimal part (R's default type for numbers)
  • Used for most calculations
sqrt(2)
[1] 1.414214
  • Also 🥧 (π)
pi
[1] 3.141593

Complex

  • Rarely needed, but will come up when using Fourier transforms etc.
complex(real = 1, imaginary = 2)
[1] 1+2i


1 + 2i
[1] 1+2i

The building blocks

  • Not numbers:
    • logical (TRUE, FALSE)
    • character ("hello", LETTERS)
    • factor (factor(x = LETTERS))

Logical

  • Logical expressions (binary: yes/no, 1/0, true/false)
4 < 5
[1] TRUE


is.numeric("I am a number!")
[1] FALSE


TRUE == FALSE
[1] FALSE


TRUE == T
[1] TRUE
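
Comparisons also work on whole vectors, and the resulting logical values can be counted. A minimal sketch with made-up values:

```r
Amplitudes <- c(0.5, 1.2, 3.4, 0.9)   # hypothetical measurements

AboveOne <- Amplitudes > 1            # FALSE TRUE TRUE FALSE
NAbove   <- sum(AboveOne)             # TRUE counts as 1, so this counts hits: 2
Fraction <- mean(AboveOne)            # proportion of TRUE: 0.5

any(AboveOne)                         # is at least one value above 1?
all(AboveOne)                         # are all values above 1?
```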

The building blocks

  • Organised structures:
    • vector
    • matrix/array
    • list
    • data.frame/tibble/data.table

Vectors

  • Simplest structure is the vector
    • Chain of single elements (of a single type)
1:10
 [1]  1  2  3  4  5  6  7  8  9 10
letters
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
factor(letters)
 [1] a b c d e f g h i j k l m n o p q r s t u v w x y z
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
  • Vectors have a defined length
length(letters)
[1] 26

Accessing Vectors

  • Single elements or a group can be accessed by “indexing”
    • Index: position of element in vector
letters[1]
[1] "a"
letters[4:6]
[1] "d" "e" "f"
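
Besides positions, negative indices and logical vectors can be used for indexing. A short sketch using the built-in letters vector (the variable names are made up):

```r
LastThree <- letters[-(1:23)]        # negative index: drop positions 1 to 23
FirstLast <- letters[c(1, 26)]       # several positions at once

# Logical index: keep the elements where the condition is TRUE
Vowels <- letters[letters %in% c("a", "e", "i", "o", "u")]

PositionD <- which(letters == "d")   # which position holds "d"?
```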

Accessing Vectors

  • Access parts with functions
head(letters)
[1] "a" "b" "c" "d" "e" "f"


?head()


Return the First or Last Parts of an Object

Returns the first or last parts of a vector, matrix, table, data frame or function. Since head() and tail() are generic functions, they may also have been extended to other classes.


tail(letters)
[1] "u" "v" "w" "x" "y" "z"

Matrices

A matrix is a vector with 2 dimensions

matrix(data = 1:4, nrow = 2, ncol = 2)
     [,1] [,2]
[1,]    1    3
[2,]    2    4



How the values are filled in (by column or by row) is up to you

matrix(data = 1:4, nrow = 2, ncol = 2, byrow = T)
     [,1] [,2]
[1,]    1    2
[2,]    3    4
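
Matrix elements are accessed with [row, column]. A minimal sketch using the same 2 x 2 matrix as above (the variable names are made up):

```r
M <- matrix(data = 1:4, nrow = 2, ncol = 2)   # filled column by column

Dims      <- dim(M)     # number of rows and columns: 2 2
Element   <- M[1, 2]    # row 1, column 2: 3
FirstCol  <- M[, 1]     # whole first column: 1 2
SecondRow <- M[2, ]     # whole second row: 2 4

# Transposing the column-wise matrix gives the row-wise one
SameAsByRow <- identical(t(M), matrix(data = 1:4, nrow = 2, ncol = 2, byrow = TRUE))
```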

Lists

A list can store different types of data with variable length

list(Alphabet = LETTERS, Numbers = 1:100)
$Alphabet
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"

$Numbers
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
 [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
 [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
 [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
 [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
 [91]  91  92  93  94  95  96  97  98  99 100

Lists

Elements in a list can be accessed with $

MyList <- list(Alphabet = LETTERS, Numbers = 1:100)
MyList$Alphabet
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"

Or by indexing (double brackets! [[i]])

MyList[[1]]
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
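
Single and double brackets behave differently on lists. A short sketch with the same MyList as above (the other names are made up):

```r
MyList <- list(Alphabet = LETTERS, Numbers = 1:100)

SubList <- MyList[1]      # single bracket: still a list (of length 1)
Element <- MyList[[1]]    # double bracket: the character vector itself

class(SubList)            # "list"
class(Element)            # "character"

# Double brackets also work with names and can be chained with indexing
FirstNumbers <- MyList[["Numbers"]][1:3]
```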

Data Frames

Data frames are lists where all elements have the same length

DF <- data.frame(Alphabet = LETTERS[1:10], Numbers = 1:10)
DF
   Alphabet Numbers
1         A       1
2         B       2
3         C       3
4         D       4
5         E       5
6         F       6
7         G       7
8         H       8
9         I       9
10        J      10

Data Frames

Access with $

DF$Alphabet
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"

Or by indexing ([row, column])

DF[,1]
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"


DF[1:2,]
  Alphabet Numbers
1        A       1
2        B       2
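
Rows can also be selected with a condition, and new columns can be added with $. A minimal sketch with the same DF as above; the column Squared and the name Large are made up for illustration:

```r
DF <- data.frame(Alphabet = LETTERS[1:10], Numbers = 1:10)

DF$Squared <- DF$Numbers^2        # add a new column

# Keep only the rows where the condition is TRUE (all columns)
Large <- DF[DF$Numbers > 8, ]
Large

nrow(DF)                          # number of rows
ncol(DF)                          # number of columns
```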

Functions

Functions take an input and can return an output

SimData <- rnorm(n = 100, mean = 42, sd = 2)
head(SimData)
[1] 41.66084 45.13191 41.75570 38.12088 42.38217 43.05238


Summary function:

summary(SimData)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  36.95   40.04   41.96   41.73   43.11   45.75 


Standard deviation:

sd(SimData)
[1] 2.017271

Functions

If you want to (or have to), you can write your own functions

MySummaryFunction <- function(x) {
  if(length(x)>1) {
    # do something here when x longer than 1
  } else {
    print("Input is too short!")
  }
}


MySummaryFunction <- function(x) {
  if(length(x)>1) {
    OutputDF <- data.frame(N = length(x),
                           Mean = mean(x),
                           Median = median(x),
                           SD = sd(x),
                           SEM = sd(x)/sqrt(length(x))
    )
  } else {
    print("Input is too short!")
  }
}


MySummaryFunction <- function(x) {
  if(length(x)>1) {
    OutputDF <- data.frame(N = length(x),
                           Mean = mean(x),
                           Median = median(x),
                           SD = sd(x),
                           SEM = sd(x)/sqrt(length(x))
    )
    return(OutputDF)
  } else {
    print("Input is too short!")
  }
}

Call the function with a single value and with the simulated data:

MySummaryFunction(1)
[1] "Input is too short!"


MySummaryFunction(SimData)
    N     Mean   Median       SD       SEM
1 100 41.72516 41.95909 2.017271 0.2017271
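
Instead of printing a message, a function can also stop with an error when the input is too short. This is a sketch of one possible variant, not the version used in the workshop; the name StrictSummary is made up:

```r
StrictSummary <- function(x) {
  if (length(x) < 2) {
    # stop() raises an error and ends the function immediately
    stop("Input is too short!")
  }
  # The last evaluated expression is returned automatically
  data.frame(N = length(x),
             Mean = mean(x),
             SEM = sd(x) / sqrt(length(x)))
}

Result <- StrictSummary(c(1, 2, 3))

# tryCatch() lets us catch the error and keep its message
Caught <- tryCatch(StrictSummary(1),
                   error = function(e) conditionMessage(e))
```

The advantage of stop() is that later code cannot silently continue with a missing result.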

Useful Functions

Make a sequence:

seq(from = 0, to = 1, by = 0.1)
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
seq(from = 0, to = 1, length.out = 5)
[1] 0.00 0.25 0.50 0.75 1.00

Useful Functions

Make repeats:

rep(x = c("A", "B"), each = 3)
[1] "A" "A" "A" "B" "B" "B"
rep(x = c("A", "B"), times = 3)
[1] "A" "B" "A" "B" "A" "B"
rep_len(x = "A", length.out = 5)
[1] "A" "A" "A" "A" "A"

Useful Functions

Simulate!

Simulations are very useful for checking an analysis or getting a feeling for the data (or you may already have a specific simulation in mind)

rnorm(n = 4, mean = 0, sd = 1)
[1] -0.3342374 -0.6652091 -0.1018155 -0.5427938
rbinom(n = 10, size = 1, prob = 0.2)
 [1] 0 0 0 0 0 0 1 1 0 0

For more distributions check ?Distributions
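
Random numbers differ on every run. If a simulation should be reproducible, the random number generator can be seeded first. A minimal sketch; the seed value 42 and the variable names are arbitrary choices:

```r
set.seed(42)                                   # fix the starting point
FirstDraw <- rnorm(n = 3, mean = 0, sd = 1)

set.seed(42)                                   # same seed again
SecondDraw <- rnorm(n = 3, mean = 0, sd = 1)

identical(FirstDraw, SecondDraw)               # same seed, same numbers
```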

Useful Functions

Can we apply a function to all rows or columns of a matrix?

Use apply!

TestMatrix <- matrix(data = rnorm(n = 100), nrow = 10, ncol = 10)
apply(X = TestMatrix, MARGIN = 1, FUN = mean)
 [1] 0.39425006 0.21709478 0.15165701 0.03529147 0.26592828 0.31293525
 [7] 0.14851473 0.10812630 0.07072715 0.88166842
  • MARGIN: use function on rows (1) or columns (2)
  • apply will try to “simplify” output by default (as 1D vector or 2D matrix)
  • Otherwise output will be a list
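
The effect of MARGIN is easier to see on a small matrix with known values. A minimal sketch; SmallMatrix is made up for illustration:

```r
SmallMatrix <- matrix(data = 1:6, nrow = 2, ncol = 3)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

RowTotals <- apply(X = SmallMatrix, MARGIN = 1, FUN = sum)   # one value per row
ColTotals <- apply(X = SmallMatrix, MARGIN = 2, FUN = sum)   # one value per column

RowTotals
ColTotals
```

For sums and means of rows or columns, base R also offers rowSums(), colSums(), rowMeans() and colMeans().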

Useful Functions

Other apply functions:

  1. lapply: uses list or vector → outputs list
  2. sapply: uses list or vector → outputs vector, matrix or list
lapply(X = 1:5, FUN = function(x) {
  rnorm(n = x, mean = 0, sd = 1)
})
[[1]]
[1] 0.1465591

[[2]]
[1]  0.6826052 -0.3955907

[[3]]
[1] -0.7743689  0.5766927 -1.1109773

[[4]]
[1] -0.06860697  0.48722224  0.28141192  2.40478590

[[5]]
[1] -0.05428898  0.19428888  1.80600519  0.07745973  0.22978090
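
When every call returns a single value, sapply can simplify the result to a vector, while lapply always returns a list. A short sketch with made-up variable names:

```r
SquaresList <- lapply(X = 1:5, FUN = function(x) x^2)   # list with 5 elements
SquaresVec  <- sapply(X = 1:5, FUN = function(x) x^2)   # numeric vector

# vapply works like sapply, but checks that each result has the
# type and length given in FUN.VALUE
SquaresChecked <- vapply(X = 1:5, FUN = function(x) x^2, FUN.VALUE = numeric(1))

SquaresVec
```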

Useful Functions

Find files in folders with list.files

list.files(path = "Data/FileDirectory")
 [1] "DataFile1.csv" "DataFile2.csv" "DataFile3.csv" "DataFile4.csv"
 [5] "DataFile5.csv" "File1.csv"     "File2.csv"     "File3.csv"    
 [9] "File4.csv"     "File5.csv"     "SubDirectory" 

Output the full file paths

list.files(path = "Data/FileDirectory", full.names = T)
 [1] "Data/FileDirectory/DataFile1.csv" "Data/FileDirectory/DataFile2.csv"
 [3] "Data/FileDirectory/DataFile3.csv" "Data/FileDirectory/DataFile4.csv"
 [5] "Data/FileDirectory/DataFile5.csv" "Data/FileDirectory/File1.csv"    
 [7] "Data/FileDirectory/File2.csv"     "Data/FileDirectory/File3.csv"    
 [9] "Data/FileDirectory/File4.csv"     "Data/FileDirectory/File5.csv"    
[11] "Data/FileDirectory/SubDirectory" 

Output only specific files with "Data" in the name

list.files(path = "Data/FileDirectory", pattern = "Data", full.names = T)
[1] "Data/FileDirectory/DataFile1.csv" "Data/FileDirectory/DataFile2.csv"
[3] "Data/FileDirectory/DataFile3.csv" "Data/FileDirectory/DataFile4.csv"
[5] "Data/FileDirectory/DataFile5.csv"

Also output files in folders inside the folder

list.files(path = "Data/FileDirectory", recursive = T, full.names = T)
 [1] "Data/FileDirectory/DataFile1.csv"               
 [2] "Data/FileDirectory/DataFile2.csv"               
 [3] "Data/FileDirectory/DataFile3.csv"               
 [4] "Data/FileDirectory/DataFile4.csv"               
 [5] "Data/FileDirectory/DataFile5.csv"               
 [6] "Data/FileDirectory/File1.csv"                   
 [7] "Data/FileDirectory/File2.csv"                   
 [8] "Data/FileDirectory/File3.csv"                   
 [9] "Data/FileDirectory/File4.csv"                   
[10] "Data/FileDirectory/File5.csv"                   
[11] "Data/FileDirectory/SubDirectory/SubDirFile1.csv"
[12] "Data/FileDirectory/SubDirectory/SubDirFile2.csv"
[13] "Data/FileDirectory/SubDirectory/SubDirFile3.csv"
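
One common use of list.files is to read many files in one go. The following is a sketch under assumptions: it first writes two small made-up CSV files into a temporary folder (ExampleDir and the file contents are hypothetical), then reads them all back and stacks them:

```r
# Create a temporary folder with two small example files (made-up data)
ExampleDir <- file.path(tempdir(), "FileDirectoryExample")
dir.create(ExampleDir, showWarnings = FALSE)
write.csv(data.frame(Slice = 1:2, Value = c(1.42, 0.78)),
          file = file.path(ExampleDir, "DataFile1.csv"), row.names = FALSE)
write.csv(data.frame(Slice = 3:4, Value = c(0.96, 0.64)),
          file = file.path(ExampleDir, "DataFile2.csv"), row.names = FALSE)

# Find all csv files; pattern is a regular expression ("\\.csv$" = ends with .csv)
Files <- list.files(path = ExampleDir, pattern = "\\.csv$", full.names = TRUE)

Tables   <- lapply(X = Files, FUN = read.csv)   # one data frame per file
Combined <- do.call(rbind, Tables)              # stack them into one data frame
Combined
```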

Useful Concepts

  • Wide vs long format (more on that later)
  • Formulas:
    • A variable Measurement depends on (~) Group (Measurement ~ Group)
    • Measurement depends on Group and Time (Measurement ~ Group + Time)
    • Measurement depends on Group and Time and their interaction (Measurement ~ Group*Time)
      • Measurement ~ Group + Time + Group:Time
    • Formulas are also sometimes used to describe a data structure (more on that later)
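
Formulas are ordinary R objects and can be inspected. The sketch below shows how Group*Time expands, and uses a formula with aggregate() from base R on a made-up data set (ToyData and its values are hypothetical):

```r
Interaction <- Measurement ~ Group * Time

class(Interaction)                               # "formula"
TermLabels <- attr(terms(Interaction), "term.labels")
TermLabels                                       # "Group" "Time" "Group:Time"

# A formula in use: mean of Measurement for each Group
ToyData <- data.frame(Measurement = c(1, 3, 10, 14),
                      Group = c("A", "A", "B", "B"))
GroupMeans <- aggregate(Measurement ~ Group, data = ToyData, FUN = mean)
GroupMeans
```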

Analyse and communicate your data

  • Make a presentation like this here in R using data and your workflow
  • Write a report (pdf, html)
  • Export as R Notebook or Jupyter notebook
  • Write your whole paper or thesis with citations, figures etc. (rticles, bookdown)

Why should you use a script?

  • Every time you write R code you can reuse it

  • Just update the data set and you will get the whole analysis again in seconds, reproducibly

  • Make graphs which you don’t have to modify in Illustrator etc.

    • Just change whatever you want in the code (size, colour, layout etc.)

Get organised

  • A project should start with a New Project

  • Keep track of changes with version control (Git) in RStudio

    • Just link your project to Git and keep track²
    • Share your workflow with colleagues or work together

Can I now do everything? (Packages)

It depends…

  • You might need to install the right tool: package
install.packages("readxl")
  • Now install ggplot2 and data.table

How do I know which Package I should install?

  • List of useful packages on RStudio

  • Approach to find packages: Wendt and Anderson (2022)

  • Most things you can do with what is there

  • If you can formulate your problem you will find a solution online with the needed package

Quick Analysis

Get the data sets and the code for the examples today:

download.file(url = "https://github.com/danielparthier/WorkshopSchmitzLabExamples/archive/main.zip",
              destfile = "WorkshopRExamples.zip")
unzip(zipfile = "WorkshopRExamples.zip")

Quick Analysis

  • First steps to analyse data?
    • Load the data!
    • You can just import your data from an Excel sheet or text file.

Load Data

  • Or in a line of code
library(readxl)
DataPPR <- read_excel("Data/DataPPR.xlsx")

The string (in quotation marks "...") is the file path

  • You can move through directory levels using Tab (autocompletion)

DataPPR
# A tibble: 31 × 5
   Slice Group Pulse1 Pulse2 EventsAfterEPSP
   <dbl> <chr>  <dbl>  <dbl>           <dbl>
 1     1 WT      1.42   2.7                1
 2     2 WT      0.78   0.87               2
 3     3 WT      0.96   1.64               2
 4     4 WT      0.64   1.35               2
 5     5 WT      0.92   1.88               2
 6     6 WT      0.76   1.21               1
 7     7 WT      1.67   3.09               0
 8     8 WT      0.93   0.89               1
 9     9 WT      0.64   1.58               3
10    10 WT      0.57   1.25               2
# … with 21 more rows

Look at data summary

summary(DataPPR)
     Slice         Group               Pulse1          Pulse2     
 Min.   : 1.0   Length:31          Min.   :0.480   Min.   :0.470  
 1st Qu.: 8.5   Class :character   1st Qu.:0.755   1st Qu.:1.160  
 Median :16.0   Mode  :character   Median :0.960   Median :1.450  
 Mean   :16.0                      Mean   :1.014   Mean   :1.692  
 3rd Qu.:23.5                      3rd Qu.:1.255   3rd Qu.:2.155  
 Max.   :31.0                      Max.   :1.670   Max.   :3.220  
 EventsAfterEPSP
 Min.   :0.000  
 1st Qu.:0.500  
 Median :2.000  
 Mean   :1.516  
 3rd Qu.:2.000  
 Max.   :4.000  

Quick base R plot

plot(DataPPR)

Different way of plotting

  • Using the “Grammar of Graphics”
  • Build figures based on layers and modules

What are we plotting?

library(data.table)
setDT(DataPPR) # or reassign using as.data.table()
DataPPR
    Slice Group Pulse1 Pulse2 EventsAfterEPSP
 1:     1    WT   1.42   2.70               1
 2:     2    WT   0.78   0.87               2
 3:     3    WT   0.96   1.64               2
 4:     4    WT   0.64   1.35               2
 5:     5    WT   0.92   1.88               2
 6:     6    WT   0.76   1.21               1
 7:     7    WT   1.67   3.09               0
 8:     8    WT   0.93   0.89               1
 9:     9    WT   0.64   1.58               3
10:    10    WT   0.57   1.25               2
11:    11    WT   1.40   2.81               2
12:    12    WT   1.28   2.67               0
13:    13    WT   1.24   2.19               4
14:    14    WT   1.10   2.12               1
15:    15    WT   1.27   3.22               2
16:    16    KO   0.72   1.14               0
17:    17    KO   0.86   0.47               1
18:    18    KO   1.64   2.78               4
19:    19    KO   1.56   2.12               4
20:    20    KO   1.06   1.52               2
21:    21    KO   1.05   1.35               0
22:    22    KO   0.83   1.40               0
23:    23    KO   0.48   0.65               2
24:    24    KO   0.68   1.00               2
25:    25    KO   1.10   1.45               0
26:    26    KO   1.18   2.28               0
27:    27    KO   1.30   2.05               2
28:    28    KO   0.77   1.14               2
29:    29    KO   0.75   1.39               1
30:    30    KO   0.72   1.06               2
31:    31    KO   1.14   1.18               0
    Slice Group Pulse1 Pulse2 EventsAfterEPSP

Plotting data

library(ggplot2) # main plotting library
library(ggbeeswarm) # adds the beeswarm to ggplot2
  • We don’t have to load data.table again because we still have it active from before

Short intro to data.table:



DT[i, j, by]

  • i: on which rows?
  • j: what to do?
  • by: grouped by what?
  • In j: calculate the mean/sum/sd etc.
    • for all groups or combinations of groups separately
  • Assign the result to a new column with :=
Example data.table

ExampleDT <- data.table(Numbers = 1:100,
                        Groups = c("A", "B", "C", "D"),
                        Treatment = c("yes", "no"))
print(ExampleDT, topn = 5)
     Numbers Groups Treatment
  1:       1      A       yes
  2:       2      B        no
  3:       3      C       yes
  4:       4      D        no
  5:       5      A       yes
 ---                         
 96:      96      D        no
 97:      97      A       yes
 98:      98      B        no
 99:      99      C       yes
100:     100      D        no

Example data.table

ExampleDT[, mean(Numbers), by = Treatment]
   Treatment V1
1:       yes 50
2:        no 51
  • Calculate mean (V1) by treatment variable


ExampleDT[, .(Mean=mean(Numbers)), by = .(Groups, Treatment)]
   Groups Treatment Mean
1:      A       yes   49
2:      B        no   50
3:      C       yes   51
4:      D        no   52
  • To return a result with a “name”, return a “list”
  • A combination of grouping variables can be given as a “list”
  • .() is the same as list()

Plotting Data

DataPPR[, PPR := Pulse2 / Pulse1]
print(DataPPR, topn=2)
    Slice Group Pulse1 Pulse2 EventsAfterEPSP      PPR
 1:     1    WT   1.42   2.70               1 1.901408
 2:     2    WT   0.78   0.87               2 1.115385
---                                                   
30:    30    KO   0.72   1.06               2 1.472222
31:    31    KO   1.14   1.18               0 1.035088
  • := means write/add a column³

Plotting Data

ggplot(data = DataPPR, aes(x = Group, y = PPR))+
  geom_beeswarm()

Plotting Data

ggplot(data = DataPPR, aes(x = Group, y = PPR, colour = Group))+
  geom_beeswarm()

Plotting Data

ggplot(data = DataPPR, aes(x = Group, y = PPR, colour = Group))+
  geom_beeswarm(size=4, alpha=0.5, cex = 6, priority = "ascending")

Plotting Data

ggplot(data = DataPPR, aes(x = Group, y = PPR, colour = Group))+
  geom_beeswarm(size=4, alpha=0.5, cex = 6, priority = "ascending")+
  scale_y_continuous(name = "PPR")+
  scale_x_discrete(name = "")+
  scale_colour_manual(values = c("WT" = "black", "KO" = "red"), name = "")+
  theme_classic()+
  theme(legend.position = "none")

Sort and Clean

ggplot(data = DataPPR, aes(x = Group, y = PPR, colour = Group))+
  geom_beeswarm(size=4, alpha=0.5, cex = 6)+
  scale_y_continuous(name = "PPR")+
  scale_x_discrete(name = "", position = "top")+
  scale_colour_manual(values = c("WT" = "black", "KO" = "red"), name = "")+
  theme_classic()+
  theme(legend.position = "none", axis.line.x = element_blank(),
        axis.ticks.x = element_blank())

How to Arrange the Data?

Wide:

Slice  Group  Pulse1  Pulse2
    1  WT       1.42    2.70
    2  WT       0.78    0.87
    3  WT       0.96    1.64
    4  WT       0.64    1.35
    5  WT       0.92    1.88

Long:

Slice  Group  Pulse   Amplitude
    1  WT     Pulse1       1.42
    2  WT     Pulse1       0.78
    3  WT     Pulse1       0.96
    4  WT     Pulse1       0.64
    5  WT     Pulse1       0.92
    1  WT     Pulse2       2.70
    2  WT     Pulse2       0.87
    3  WT     Pulse2       1.64
    4  WT     Pulse2       1.35
    5  WT     Pulse2       1.88

  • In the long format, Pulse holds the variable name (the former column name)
  • Amplitude holds the value (same unit for all rows)

What are We Plotting?

DT <- melt.data.table(data = DataPPR,
                      id.vars = c("Slice", "Group"),
                      measure.vars = c("Pulse1", "Pulse2"),
                      variable.name = "Pulse",
                      value.name = "Amplitude")
  • Reshape the data without rewriting and copying data by hand

DataPPR:

Slice  Group  Pulse1  Pulse2
    1  WT       1.42    2.70
    2  WT       0.78    0.87
    3  WT       0.96    1.64
    4  WT       0.64    1.35
    5  WT       0.92    1.88

DT:

Slice  Group  Pulse   Amplitude
    1  WT     Pulse1       1.42
    2  WT     Pulse1       0.78
    3  WT     Pulse1       0.96
    4  WT     Pulse1       0.64
    5  WT     Pulse1       0.92
    1  WT     Pulse2       2.70
    2  WT     Pulse2       0.87
    3  WT     Pulse2       1.64
    4  WT     Pulse2       1.35
    5  WT     Pulse2       1.88

What are We Plotting?

DT[,Pulse:=gsub(pattern = "Pulse", replacement = "", x = Pulse),]
  • Change or add columns

  • gsub(pattern = "Pulse", replacement = "", x = "Pulse1") returns "1"

    • replace parts of a string with another string (can be empty)

DT:


    Slice Group Pulse Amplitude
 1:     1    WT     1      1.42
 2:     2    WT     1      0.78
 3:     3    WT     1      0.96
 4:     4    WT     1      0.64
 5:     5    WT     1      0.92
---                            
58:    27    KO     2      2.05
59:    28    KO     2      1.14
60:    29    KO     2      1.39
61:    30    KO     2      1.06
62:    31    KO     2      1.18

Grammar of Graphics

ggplot(data = DT, mapping = aes(x = Pulse,
                                y = Amplitude,
                                group = Slice,
                                colour = Group))+
  geom_point()

Grammar of Graphics

ggplot(data = DT, mapping = aes(x = Pulse,
                                y = Amplitude,
                                group = Slice,
                                colour = Group))+
  geom_point(alpha=0.5, size=4)

Grammar of Graphics

ggplot(data = DT, mapping = aes(x = Pulse,
                                y = Amplitude,
                                group = Slice,
                                colour = Group))+
  geom_point(alpha=0.5, size=4)+
  facet_wrap(facets =  ~ Group)

Grammar of Graphics

ggplot(data = DT, mapping = aes(x = Pulse,
                                y = Amplitude,
                                group = Slice,
                                colour = Group))+
  geom_point(alpha=0.5, size=4)+
  geom_line()+
  facet_wrap(facets =  ~ Group)

Grammar of Graphics

ggplot(data = DT, mapping = aes(x = Pulse,
                                y = Amplitude,
                                group = Slice,
                                colour = Group))+
  geom_point(alpha=0.5, size=4)+
  geom_line()+
  facet_wrap(facets =  ~ Group)+
  scale_y_continuous(name = "Amplitude (mV)", limits = c(0,5), expand = c(0,0))+
  scale_x_discrete(name = "Pulse Number")

Grammar of Graphics

The plot structure is done!

Can we make it pretty though?

Grammar of Graphics

ggplot(data = DT, mapping = aes(x = Pulse, y = Amplitude, group = Slice, colour = Group))+
  geom_point(alpha=0.5, size=4)+
  geom_line()+
  facet_wrap(facets =  ~ Group)+
  scale_y_continuous(name = "Amplitude (mV)", limits = c(0,5), expand = c(0,0))+
  scale_x_discrete(name = "Pulse Number")+
  scale_colour_manual(values = c("WT" = "black", "KO" = "red"), name = "")

Grammar of Graphics

ggplot(data = DT, mapping = aes(x = Pulse, y = Amplitude, group = Slice, colour = Group))+
  geom_point(alpha=0.5, size=4)+
  geom_line()+
  facet_wrap(facets =  ~ Group)+
  scale_y_continuous(name = "Amplitude (mV)", limits = c(0,5), expand = c(0,0))+
  scale_x_discrete(name = "Pulse Number")+
  scale_colour_manual(values = c("WT" = "black", "KO" = "red"), name = "")+
  theme_classic()

Grammar of Graphics

ggplot(data = DT, mapping = aes(x = Pulse, y = Amplitude, group = Slice, colour = Group))+
  geom_point(alpha=0.5, size=4)+
  geom_line()+
  facet_wrap(facets =  ~ Group)+
  scale_y_continuous(name = "Amplitude (mV)", limits = c(0,5), expand = c(0,0))+
  scale_x_discrete(name = "Pulse Number")+
  scale_colour_manual(values = c("WT" = "black", "KO" = "red"), name = "")+
  theme_classic()+
  theme(strip.background = element_blank())

Grammar of Graphics

AmplitudePlot <- ggplot(data = DT, aes(x = as.factor(Pulse), y = Amplitude, group = Slice, colour = Group))+
  geom_point(alpha=0.5, size=4)+
  geom_line()+
  facet_wrap(facets =  ~ Group)+
  scale_y_continuous(name = "Amplitude (mV)", limits = c(0,5), expand = c(0,0))+
  scale_x_discrete(name = "Pulse Number")+
  scale_colour_manual(values = c("WT" = "black", "KO" = "red"), name = "")+
  theme_classic()+
  theme(strip.background = element_blank(), strip.placement = "outside")

Grammar of Graphics

PPRTable <- dcast.data.table(data = DT,
                             formula = Slice + Group ~ Pulse,
                             value.var = "Amplitude")

DT:

Slice  Group  Pulse  Amplitude
    1  WT     1           1.42
    2  WT     1           0.78
    3  WT     1           0.96
    4  WT     1           0.64
    5  WT     1           0.92
    1  WT     2           2.70
    2  WT     2           0.87
    3  WT     2           1.64
    4  WT     2           1.35
    5  WT     2           1.88

PPRTable:

Slice  Group     1     2
    1  WT     1.42  2.70
    2  WT     0.78  0.87
    3  WT     0.96  1.64
    4  WT     0.64  1.35
    5  WT     0.92  1.88

Grammar of Graphics

PPRTable[,PPR:= `2`/`1`,]

Grammar of Graphics

PPRPlot <- ggplot(data = PPRTable, aes(x = Group,
                                       y = PPR,
                                       colour = Group))+
  ggbeeswarm::geom_beeswarm(size=4, alpha=0.5, cex = 6)+
  scale_y_continuous(name = "PPR")+
  scale_x_discrete(name="", position = "top")+
  scale_colour_manual(values = c("WT" = "black", "KO" = "red"), name = "")+
  theme_classic()+
  theme(legend.position = "none", axis.line.x = element_blank(), axis.ticks.x = element_blank())

Grammar of Graphics - patchwork

library(patchwork)

AmplitudePlot + PPRPlot

Grammar of Graphics - patchwork

library(patchwork)

AmplitudePlot + PPRPlot +
  plot_annotation(tag_levels = "A") +
  plot_layout(widths = c(2,1), guides='collect') & 
  theme(axis.text = element_text(size=16, colour = "black"),
        strip.text = element_text(size=16),
        axis.title = element_text(size=18),
        plot.tag = element_text(size=22))

  • Change tag annotation (can also recognise nested plots: A, B1, B2)

  • Change layout based on dimensions or with “design matrix”

  • Change “global” plot settings applied to all plots (make them the same)

Grammar of Graphics - patchwork

library(patchwork)

CompletePanel <- AmplitudePlot + PPRPlot +
  plot_annotation(tag_levels = "A") +
  plot_layout(widths = c(2,1), guides='collect') & 
  theme(axis.text = element_text(size=16, colour = "black"),
        strip.text = element_text(size=16),
        axis.title = element_text(size=18),
        plot.tag = element_text(size=22))

Saving your Plot

ggsave(filename = "FinalPlot.pdf",
       plot = CompletePanel,
       device = "pdf",
       path = "Output/",
       width = 9,
       height = 6,
       units = "in")
  • You can save files as: eps, ps, tex (pictex), pdf, jpeg, tiff, png, bmp, svg or wmf (Windows only)

  • Units: in, cm, mm, or px
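
Saving may fail if the folder given in path does not exist yet. A minimal sketch of how the folder could be created beforehand with base R; OutputDir is a made-up location inside a temporary folder, in a real project this would be the "Output" folder of the project:

```r
# Hypothetical output folder (a temporary location for this example)
OutputDir <- file.path(tempdir(), "Output")

if (!dir.exists(OutputDir)) {
  # recursive = TRUE also creates missing parent folders
  dir.create(OutputDir, recursive = TRUE)
}

dir.exists(OutputDir)   # the folder can now be used as path when saving
```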

Don’t worry

It is impossible to know everything, but you will find everything online or in R itself!

  • Type ? before a function name, use help(), or open the Help tab

  • A great forum for finding solutions: stackoverflow.com

  • For ggplot2: ggplot2.tidyverse.org

  • For other packages, search for “packagename vignette”

  • Find cheat sheets! (Help -> Cheat Sheets)

  • Google: there is hardly a problem for which no solution can be found

Welcome to the 2nd part

Previously

So far we have only considered normally distributed data

But things are normal/Gaussian less often than we think…

Often parameters (e.g. the mean) can be normally distributed even when the data themselves are not.

But there is a whole new world!

Welcome to the Paranormal!

Distributions

There is a large set of distributions:

  • beta distribution

References

Wendt, Caroline J., and G. Brooke Anderson. 2022. “Ten Simple Rules for Finding and Selecting R Packages.” Edited by Scott Markel. PLOS Computational Biology 18 (3): e1009884. https://doi.org/10.1371/journal.pcbi.1009884.

Footnotes

  1. Don’t worry about any of these. R will take care of it automatically.

  2. Can be a personal repository.

  3. No copy of the data is necessary, in contrast to reassigning with = or <-.